A novel LASSO‐derived prognostic model predicting survival for non‐small cell lung cancer patients with M1a diseases

Abstract
Introduction The current American Joint Committee on Cancer (AJCC) M1a staging of non‐small cell lung cancer (NSCLC) encompasses a wide disease spectrum with diverse prognoses.
Methods Patients who were diagnosed in an earlier period formed the training cohort, and those who were diagnosed thereafter formed the validation cohort. Kaplan–Meier analysis was performed for the training cohort by dividing the M1a stage into three subgroups: (I) malignant pleural effusion (MPE) or malignant pericardial effusion (MPCE); (II) separate tumor nodules in contralateral lung (STCL); and (III) pleural tumor nodules on the ipsilateral lung (PTIL). Gender, age, histology, N stage, grade, surgery for primary site, lymphadenectomy, M1a group, and chemotherapy were selected as independent prognostic factors using least absolute shrinkage and selection operator (LASSO) Cox regression analysis, and a nomogram was constructed using Cox hazard regression analysis. Accuracy and clinical practicability were tested by Harrell's concordance index, the receiver operating characteristic (ROC) curve, calibration plots, a residual plot, the integrated discrimination improvement (IDI), the net reclassification improvement (NRI), and decision curve analysis (DCA).
Results The concordance index (0.661 for the training cohort and 0.688 for the validation cohort) and the area under the ROC curve (training cohort: 0.709 for 1‐year and 0.727 for 2‐year OS prediction; validation cohort: 0.737 for 1‐year and 0.734 for 2‐year OS prediction) indicated satisfactory discriminative ability of the nomogram. The calibration curves and DCA indicated good prognostic accuracy and clinical applicability. Its prognostic accuracy exceeded that of the AJCC staging, as assessed by NRI (1‐year: 0.327; 2‐year: 0.302) and IDI (1‐year: 0.138; 2‐year: 0.130).
Conclusion Our study established a nomogram for the prediction of 1‐ and 2‐year OS in patients with NSCLC diagnosed with stage M1a, helping healthcare workers accurately evaluate the individual survival of M1a NSCLC patients. The accuracy and clinical applicability of this nomogram were validated.


| INTRODUCTION
Lung cancer, a prevalent malignancy, is the leading cause of cancer-associated mortality worldwide. 1 The main pathological types of lung cancer are non-small cell lung cancer (NSCLC) and small-cell lung cancer (SCLC), accounting for 85% and 15% of all lung cancers, respectively. 2 At initial diagnosis, many NSCLC patients present with malignant pleural effusion (MPE) or malignant pericardial effusion (MPCE). 3 These patients are categorized as M1a stage under the seventh edition of the American Joint Committee on Cancer (AJCC) tumor-node-metastasis (TNM) staging system. Besides MPE and MPCE, separate tumor nodules in the contralateral lung (STCL) and pleural tumor nodules on the ipsilateral lung (PTIL) are also included in M1a. 4 Several studies have found diverse prognoses among different metastatic sites in M1a patients, 5 with median overall survival ranging from 3 to 8 months, 6,7 suggesting high heterogeneity within the M1a stage.
The existing AJCC staging system depends only on tumor size (T), nodal status (N), and metastasis (M), and lacks an evaluation of clinicopathologic characteristics such as age, gender, histology, metastatic organ, the number of metastatic sites, and treatment modality, 8 which could also affect prognosis. The TNM system alone is therefore not sufficient for predicting the outcome of an individual patient. Accurate risk stratification allows patients and physicians to better balance pros and cons when making decisions. 9 A more accurate and comprehensive tool is thus needed.
A nomogram, a visual risk regression model, is an ideal tool for predicting patients' prognostic outcomes. 10 Various nomograms have been constructed for the prognostic prediction of metastatic NSCLC patients, such as those with distant organ metastasis 11 or with MPE or MPCE. 12 In 2020, a study created a nomogram focused on M1a NSCLC patients. 13 Several clinical characteristics and clinicopathological variables were included in that nomogram. Nevertheless, the metastatic site, which has been demonstrated to influence overall survival in M1a NSCLC patients 14 and should be considered when predicting individualized prognosis, was not included as a predictor. Thus, an efficient model to predict the survival of NSCLC patients with the various M1a descriptors is still lacking. Herein, we aimed to establish a novel nomogram to assess relevant risk factors, estimate overall survival, and provide a reliable prognostic indication for NSCLC patients initially diagnosed with different metastatic sites in the M1a stage.

KEYWORDS
nomogram, non-small cell lung cancer, prognostic, SEER

… laterality, primary site, seventh edition AJCC system T and N stage, M1a group, surgery for primary site, survival time [months], lymphadenectomy, surgical resection of metastatic lesions, radiation, and chemotherapy), and vital status recode were collected from the SEER database. Age, a continuous variable, was dichotomized into two groups (< 60 years and ≥ 60 years). Surgery and radiotherapy were each coded in two categories ("No" or "Yes"), while chemotherapy was coded as "No/Unknown" or "Yes". Overall survival (OS) was defined as the time between the date of diagnosis and death from any cause or the last follow-up. OS was chosen as the primary outcome. The survival difference among the descriptors in the M1a stage was evaluated using the log-rank test. Depending on the results of the log-rank test, we then reclassified M1a into three subgroups (MPE/MPCE, STCL, and PTIL).

| Statistical analysis
To develop the nomogram and enable further external validation, eligible patients who were diagnosed between 2010 and 2013 were included in the training cohort, and those diagnosed between 2014 and 2015 were entered into the validation cohort. 16,17 Demographic and clinicopathological characteristics of patients in the training and validation cohorts were summarized descriptively, and the median survival time with a 95% confidence interval (CI) for each subgroup was calculated using Kaplan-Meier survival analysis. Categorical variables were compared using the chi-squared test.
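For illustration, the temporal cohort split and the Kaplan-Meier median survival can be sketched in Python. This is a minimal sketch under stated assumptions, not the analysis code used in the study: the function names (`assign_cohort`, `kaplan_meier`, `median_survival`) and the toy follow-up data are hypothetical, and confidence intervals are omitted.

```python
from typing import List, Optional, Tuple


def assign_cohort(year_of_diagnosis: int) -> str:
    """Temporal split from the text: 2010-2013 training, 2014-2015 validation."""
    if 2010 <= year_of_diagnosis <= 2013:
        return "training"
    if 2014 <= year_of_diagnosis <= 2015:
        return "validation"
    raise ValueError("year outside the 2010-2015 study window")


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate: (time, S(time)) at each distinct death time.

    events[i] is 1 for an observed death and 0 for a censored follow-up.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects that share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """First time at which S(t) drops to 0.5 or below; None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Toy follow-up times in months (hypothetical, not SEER data).
toy_times = [2, 3, 3, 5, 8, 12]
toy_events = [1, 1, 0, 1, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
print(median_survival(toy_curve))
```

Subjects censored at the same time as a death are counted as still at risk at that time, which is the usual convention for the product-limit estimator.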
As least absolute shrinkage and selection operator (LASSO) Cox regression can effectively avoid the redundancy or overfitting that occurs during feature selection, 18,19 we used this regression model in the training cohort to identify independent risk factors that affect OS. As the penalty factor (λ) increases, the coefficients of the respective variables shrink. At the optimal λ, the coefficients of some variables are compressed to 0, and the variables that retain non-zero coefficients are the final selected variables. Fivefold cross-validation was used to determine the optimal LASSO penalty. The resulting variables were included in the nomogram. The nomogram adopted 1- and 2-year OS as the endpoints. To determine whether the nomogram could distinguish between patients with dissimilar outcomes, discrimination for internal validation in the training cohort and external validation in the validation cohort was evaluated using Harrell's concordance index (C-index) with a 95% CI, the receiver operating characteristic (ROC) curve, and the area under the curve (AUC). 20,21 The C-index typically takes a value between 0.5 and 1, with 0.5 indicating that the model is no better than chance and 1 indicating perfect discrimination. To assess the nomogram's predictive accuracy, we developed calibration plots for both the training and validation cohorts, using fivefold cross-validation and 1,000 bootstrap resamples, to establish the concordance between predicted and observed 1- and 2-year OS outcomes. The goodness-of-fit of the LASSO Cox regression model was illustrated by the Cox-Snell residual plot. 22 The model's clinical usefulness was examined using decision curve analysis (DCA), which has unique advantages in assessing the clinical benefit and utility of nomograms. 23 The nomogram was compared with the 7th edition AJCC TNM staging system using the integrated discrimination improvement (IDI) and net reclassification improvement (NRI).
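To make the discrimination measure concrete, Harrell's C-index can be sketched as the fraction of comparable patient pairs in which the patient with the higher predicted risk dies first. The code below is a simplified illustration, not the study's implementation: the function name `harrell_c_index` and the toy data are assumptions, pairs with tied survival times are skipped, and no confidence interval is computed.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float],
                    events: Sequence[int],
                    risk_scores: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i has an observed event
    strictly before subject j's follow-up time. The pair is concordant
    when subject i also has the higher risk score; tied risk scores
    count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored subject cannot be the earlier event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): higher risk score should mean earlier death.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
perfect = harrell_c_index(toy_times, toy_events, [0.9, 0.7, 0.4, 0.2])
uninformative = harrell_c_index(toy_times, toy_events, [0.5, 0.5, 0.5, 0.5])
print(perfect, uninformative)
```

A model whose risk scores carry no information lands at 0.5 because every comparable pair is scored as a tie, which matches the interpretation of 0.5 as chance-level discrimination.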

| Baseline clinicopathological features
We finally collected a total of 4,749 cases. A specific screening flowchart is shown in Figure 1. The whole cohort was divided into two groups, with 3,238 and 1,511 cases included in the training and validation cohorts, respectively. Demographic and clinicopathological characteristics of patients and their OS (95% CI) are presented in Table 1. In the training cohort, adenocarcinoma (ADC) accounted for the highest proportion, with 1,689 (52.16%) patients, whereas squamous cell carcinoma and other non-small cell carcinoma accounted for 1,181 (36.47%) and 368 (11.37%) patients, respectively. A total of 1,920 (59.30%) patients received chemotherapy, and 1,318 (40.70%) patients received no chemotherapy or had unknown chemotherapy status. The 1-year survival rate was 36.9% for MPE, 29.2% for MPCE, 46.7% for STCL, and 49.9% for PTIL. The 2-year survival rate for MPE, MPCE, STCL, and PTIL was 18.7%, 16.3%, 24.8%, and 31.2%, respectively. As shown in Figure 2

| Independent prognostic factors selection
A total of 17 variables were included in the LASSO Cox regression. After LASSO Cox regression (Figure 3), nine variables with nonzero coefficients were retained as predictors of OS: gender, age, histology, grade, N stage, M1a group, surgery for primary site, lymphadenectomy, and chemotherapy.
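The way a LASSO penalty drives some coefficients to exactly zero can be illustrated with the soft-thresholding operator, which gives the LASSO solution in the simplified case of a linear model with an orthonormal design. This sketch is an assumption-laden illustration only: it is not the penalized Cox partial likelihood fitted in this study, and the variable names (`x1` to `x3`) and coefficient values are hypothetical.

```python
from typing import Dict


def soft_threshold(beta: float, lam: float) -> float:
    """Shrink a coefficient toward zero by lam; set it to zero if |beta| <= lam."""
    if beta > lam:
        return beta - lam
    if beta < -lam:
        return beta + lam
    return 0.0


def select_variables(unpenalized: Dict[str, float], lam: float) -> Dict[str, float]:
    """Return only the variables whose shrunken coefficient is non-zero."""
    shrunk = {name: soft_threshold(b, lam) for name, b in unpenalized.items()}
    return {name: b for name, b in shrunk.items() if b != 0.0}


# Hypothetical unpenalized coefficients (not values from this study).
toy_coefs = {"x1": 0.8, "x2": -0.3, "x3": 0.05}

for lam in (0.0, 0.1, 0.5):
    # A larger penalty leaves fewer variables with non-zero coefficients.
    print(lam, sorted(select_variables(toy_coefs, lam)))
```

In practice the penalty would be chosen by cross-validation, as described in the statistical analysis, rather than fixed by hand as in this toy loop.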

| Nomogram establishment and validation
A nomogram was constructed on the basis of the resulting nine variables, and each subgroup within these variables was allocated a score (Table 2). The points from the individual variables were summed to obtain a total point score, and the predicted 1- and 2-year survival probabilities were obtained by drawing vertical lines from the total points axis to the two outcome axes (Figure 4). The C-index of the nomogram was 0.661 (95% CI: 0.650-0.672) in the training cohort and 0.688 (95% CI: 0.671-0.704) in the validation cohort. In the training cohort, the AUC was 0.709 for 1-year OS and 0.727 for 2-year OS (Figure 5(A)). In the validation cohort, the AUC for 1- and 2-year OS was 0.737 and 0.734, respectively (Figure 5(B)). Calibration for 1- and 2-year OS outcomes showed satisfactory agreement between the estimated and actual survival outcomes in both the training and validation cohorts (Figure 6(A), (B)), and the Cox-Snell residual plot also showed a good fit of our nomogram (Figure 6(C)). Therefore, the nomogram has considerable discriminative and calibration abilities.
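The point-summing procedure can be sketched as follows. One common construction rescales Cox coefficients so that the variable with the widest effect range spans 0 to 100 points, and then reads survival from the Cox relation S(t | x) = S0(t)^exp(lp). Everything numeric here is hypothetical: the coefficients, the factor names, and the baseline survival are placeholders and are not the values in Table 2 or Figure 4.

```python
import math
from typing import Dict

# Hypothetical log-hazard coefficients per level (reference level = 0.0).
# These are NOT the fitted values from this study.
COEFS: Dict[str, Dict[str, float]] = {
    "factor_a": {"ref": 0.0, "level_1": 0.4},
    "factor_b": {"ref": 0.0, "level_1": -0.2},
}


def build_points(coefs: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Rescale coefficients so the widest-ranging variable spans 0-100 points."""
    max_range = max(max(v.values()) - min(v.values()) for v in coefs.values())
    return {
        var: {lvl: 100.0 * (b - min(levels.values())) / max_range
              for lvl, b in levels.items()}
        for var, levels in coefs.items()
    }


def total_points(points: Dict[str, Dict[str, float]], patient: Dict[str, str]) -> float:
    """Sum the points of the levels observed for one patient."""
    return sum(points[var][lvl] for var, lvl in patient.items())


def linear_predictor(coefs: Dict[str, Dict[str, float]], patient: Dict[str, str]) -> float:
    """Sum of log-hazard coefficients for the patient's levels."""
    return sum(coefs[var][lvl] for var, lvl in patient.items())


def predicted_survival(baseline_survival: float, lp: float) -> float:
    """Cox model survival: S(t | x) = S0(t) ** exp(lp)."""
    return baseline_survival ** math.exp(lp)


POINTS = build_points(COEFS)
patient = {"factor_a": "level_1", "factor_b": "ref"}
score = total_points(POINTS, patient)
# 0.5 is a hypothetical baseline survival probability at the chosen time point.
survival = predicted_survival(0.5, linear_predictor(COEFS, patient))
print(score, round(survival, 3))
```

In this construction a higher total score corresponds to a higher linear predictor and therefore to a lower predicted survival probability.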

| Comparison of the nomogram & 7th edition AJCC TNM staging system
We performed DCA using the validation cohort to assess the clinical utility of our model and of AJCC TNM staging. As shown in Figure 7, the DCA curves indicate that the nomogram has better clinical applicability in predicting 1- and 2-year outcomes for M1a NSCLC patients. Compared with the AJCC system, our model displays a higher net benefit at threshold probabilities of around 0.19 or more for 1-year outcomes and 0.46 or more for 2-year outcomes.
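The quantity plotted in a decision curve is the net benefit at a threshold probability pt, commonly defined as TP/n - (FP/n) * pt/(1 - pt). The sketch below uses a plain binary outcome and ignores censoring, so it is a simplification of what a survival DCA requires; the function names and the toy data are assumptions.

```python
from typing import Sequence


def net_benefit(predicted_risk: Sequence[float],
                outcome: Sequence[int],
                threshold: float) -> float:
    """Net benefit of treating patients whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(outcome)
    true_pos = sum(1 for p, y in zip(predicted_risk, outcome)
                   if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in zip(predicted_risk, outcome)
                    if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(outcome: Sequence[int], threshold: float) -> float:
    """Reference strategy: every patient is treated regardless of risk."""
    return net_benefit([1.0] * len(outcome), outcome, threshold)


# Toy predicted risks and observed outcomes (hypothetical).
toy_risk = [0.9, 0.8, 0.3, 0.1]
toy_outcome = [1, 0, 1, 0]
nb_model = net_benefit(toy_risk, toy_outcome, 0.2)
nb_all = net_benefit_treat_all(toy_outcome, 0.2)
print(nb_model, nb_all)
```

A model is considered clinically useful over the range of thresholds where its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is zero.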
Accuracy analysis showed that the NRI of the new model in the validation set was 0.327 (95% CI: 0.277-0.379) for 1-year prognosis and 0.302 (95% CI: 0.220-0.388) for 2-year prognosis. Similarly, the IDI of the new model in the validation set was 0.138 (P < 0.001) for 1-year prognosis and 0.130 (P < 0.001) for 2-year prognosis. In conclusion, the nomogram showed superior predictive ability compared with the original AJCC staging model.
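For readers unfamiliar with these two indices, a simplified binary-outcome sketch is given below. The IDI is the gain in mean predicted risk among events minus the gain among non-events, and the category-free NRI sums the net proportion of events reclassified upward and of non-events reclassified downward. The text does not state which NRI variant or which censoring adjustment was used, so this is an assumption-based illustration with hypothetical function names and toy data.

```python
from typing import List, Sequence


def _mean(values: List[float]) -> float:
    return sum(values) / len(values)


def idi(old_risk: Sequence[float], new_risk: Sequence[float],
        outcome: Sequence[int]) -> float:
    """Integrated discrimination improvement for a binary outcome."""
    gain_events = _mean([n - o for o, n, y in zip(old_risk, new_risk, outcome) if y == 1])
    gain_nonevents = _mean([n - o for o, n, y in zip(old_risk, new_risk, outcome) if y == 0])
    return gain_events - gain_nonevents


def continuous_nri(old_risk: Sequence[float], new_risk: Sequence[float],
                   outcome: Sequence[int]) -> float:
    """Category-free net reclassification improvement for a binary outcome."""
    def net_up(label: int) -> float:
        # Net proportion of subjects with this label whose risk moved upward.
        moves = [n - o for o, n, y in zip(old_risk, new_risk, outcome) if y == label]
        up = sum(1 for m in moves if m > 0)
        down = sum(1 for m in moves if m < 0)
        return (up - down) / len(moves)

    # Events should move up; non-events should move down.
    return net_up(1) - net_up(0)


# Toy risks (hypothetical): the old model assigns everyone the same risk.
toy_outcome = [1, 1, 0, 0]
toy_old = [0.5, 0.5, 0.5, 0.5]
toy_new = [0.7, 0.6, 0.4, 0.3]
toy_idi = idi(toy_old, toy_new, toy_outcome)
toy_nri = continuous_nri(toy_old, toy_new, toy_outcome)
print(round(toy_idi, 3), toy_nri)
```

Positive values of either index indicate that the new model separates events from non-events better than the reference model.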

| DISCUSSION
As a common solid tumor, NSCLC often presents with distant metastasis late in the course of the disease, and over 15% of patients present with M1a stage disease. 3,24 The AJCC TNM staging system is currently used for prognostic prediction, since accurate prediction of survival can help physicians choose appropriate treatment. 25 However, because various high-risk factors in addition to tumor stage affect OS, reliable prognostication in M1a NSCLC patients has not been an exact science and remains an unmet need. It has been shown that nomograms can provide more accurate and individualized prediction 26 and can also visualize influencing factors. 27 Herein, using a large patient cohort from the SEER database, we constructed a novel nomogram to obtain a more accurate prediction of survival for individual M1a patients. This nomogram encompasses readily accessible and objective baseline clinicopathologic factors, including gender, age, histology, N stage, M1a subgroup, grade, surgery for primary site, lymphadenectomy, and chemotherapy. Studies have documented various M1a NSCLC prognostic algorithms that use a few baseline variables. Yin et al. 13 established an M1a prognostic nomogram (C-index for OS: 0.710; C-index for cancer-specific survival: 0.723) using propensity score matching to reduce the influence of confounding variables. Based on an M1a NSCLC cohort in which 5,976 patients had not undergone surgery and 386 patients had undergone surgery, they found that tumor resection provided a better prognosis. However, the metastatic site in M1a NSCLC was not included in their survival nomogram. Tian and colleagues 12 published a nomogram (C-index for OS: 0.772) based on MPE or MPCE, in which ipsilateral MPE indicated a better prognosis than other effusions. Our nomogram, which was established using various characteristics and a larger sample size, is the first prognostic nomogram for NSCLC patients of all metastatic sites in M1a staging.
This nomogram has functional enhancements when compared with previous prognostic M1a models.
Tumor grade is recognized as an important prognostic marker and, interestingly, was not incorporated into this risk model. A possible reason is that tumor grade might be correlated with various factors in our model and that these factors could be highly informative. Another reason might be that our model was established with a focus on a specific subgroup, namely M1a patients. Due to their heterogeneity, metastatic tumor cells are more aggressive, which makes primary tumor grade less important in prognostic prediction. 12
Whether surgery is necessary for patients with M1a stage NSCLC remains controversial. 28,29 Theoretically, surgery can completely remove the tumor foci, reduce the tumor burden, and alleviate tumor-caused complications compared with radiotherapy and chemotherapy. In survival analyses of treatments based on our data, we found that primary site surgery was associated with better OS than no surgery. It has been well documented that surgery of primary lesions in M1a NSCLC patients can remarkably improve prognosis. 13,14 Shen et al. 30 similarly reported that, compared with patients with M1b stage disease, patients with M1a stage disease had remarkably improved outcomes after undergoing surgery.
Chemotherapy, as an important part of multimodality therapy, is vital for the prognostic prediction of M1a NSCLC patients. It is also evident from our nomogram that chemotherapy is involved in the prognosis of M1a NSCLC patients. A previous study reported that patients subjected to chemotherapy combined with surgery had a significantly better prognosis than those who received chemotherapy alone or chemoradiotherapy combined with surgery. 31 Due to the limited information available in the SEER database, targeted therapy was not incorporated into this study. Targeted therapies have been shown to provide significant clinical benefit in patients with advanced lung cancer. 32,33 A meta-analysis by Liu and colleagues 34 found that targeted therapy in combination with chemotherapy prolonged progression-free survival (PFS) compared with chemotherapy alone (hazard ratio, 0.82; 95% CI: 0.78-0.87).
Radiotherapy, another important modality for tumor treatment, was not retained during the variable screening process, suggesting that radiotherapy has a minimal prognostic impact in M1a NSCLC patients. However, as a palliative treatment, radiotherapy can play a role in alleviating patient suffering and controlling tumor progression. 35
Among all the pathological types of NSCLC, ADC had the best prognosis, consistent with previous studies. 2,11,12 This may be because ADC more frequently harbors EGFR gene mutations, which makes ADC more sensitive to EGFR-TKIs, so that patients can benefit from anti-EGFR regimens. Among the M1a descriptors, MPE or MPCE suggested a poorer prognosis. A previous study also found that patients with MPE or MPCE had poor survival outcomes. 6 Therefore, whether to subclassify this heterogeneous patient population still needs to be considered. Overall, our nomogram contains reasonable factors that can effectively predict the prognosis of different M1a NSCLC patients. The established nomogram is a more precise prognostic model than the TNM staging system.
It is important to point out that this study has several limitations. First, this is a retrospective study, and some clinicopathological variables that might improve model accuracy, including body mass index, smoking status, serum markers, the use of EGFR-TKIs, detailed chemotherapy regimens, 34,36 and molecular markers, were not included because these data are not provided by the SEER database. Second, only the prognosis of a single metastatic site, that is, a single M1a subgroup, was analyzed, and no information was provided for cases with a combination of subgroups, such as "STCL + PTIL", which limits the use of our nomogram in the clinical assessment of patient prognosis. Third, our results showed that PTIL had a better prognosis than STCL and MPE/MPCE, but the number and location of pleural nodules were not recorded in detail in the SEER database, which may have implications for the analysis. Fourth, all patients in this study were grouped according to the seventh edition of the AJCC TNM staging system; however, coding rules on tumor extension made it difficult to restage the patients based on the latest eighth edition of the TNM classification. 11 In addition, our nomogram is based only on data from patients in the USA and thus may not be representative of patients globally. Therefore, studies using global prospective data, the latest TNM classification system, and comprehensive prognostic factors should be performed to improve our model.

CONFLICT OF INTEREST
The authors have no conflict of interest.

AUTHOR CONTRIBUTIONS
Hongchao Chen: Study concept and design, writing (original draft), data collection, analyses, and interpretation. Chen Huang: Funding for publication, writing (review and editing), data collection, analyses, and interpretation. Huiqing Ge: Data collection, analyses, and interpretation. Qianshun Chen: Data collection, analyses, and interpretation. Jing Chen: Data collection, analyses, and interpretation. Yuqiang Li: Data collection, analyses, and interpretation. Haiyong Chen: Data collection, analyses, and interpretation. Shiyin Luo: Data collection, analyses, and interpretation. Lilan Zhao: Study concept and design, writing (review and editing), study supervision, and project administration. Xunyu Xu: Funding acquisition, study supervision, study concept and design, and project administration. All authors read and approved the final manuscript.